Abstract. Recently, several works have proposed a new way to mine patterns in
databases with pathological size. For example, experiments in genome biology
usually provide databases with thousands of attributes (genes) but only tens of
objects (experiments). In this case, mining the "transposed" database runs
through a smaller search space, and the Galois connection allows us to infer
the closed patterns of the original database. We focus here on constrained
pattern mining for those unusual databases and give a theoretical framework for
database and constraint transposition. We discuss the properties of constraint
transposition and look into classical constraints. We then address the problem
of generating the closed patterns of the original database satisfying the
constraint, starting from those mined in the "transposed" database. Finally, we
show how to generate all the patterns satisfying the constraint from the closed
ones.



1 Introduction 

Frequent pattern mining is now well mastered, but these patterns, like
association rules, turn out to be too numerous for the experts and very
expensive to compute. They have to be filtered or constrained. However, mining
and constraining have to be done jointly (pushing the constraint) in order to
avoid combinatorial explosion [14]. Mining under complex constraints has become
a hot topic and the subject of numerous works (e.g., [14,7,16,20,10,8]).
Moreover, new domains are interested in our applications, and data schemes vary
accordingly. In genome biology, biological experiments are very expensive and
time consuming. Therefore, only a small number of these experiments can be
processed. However, thanks to new devices (such as biochips), experiments can
provide the measurements of the activity of thousands of genes. This leads to
databases with lots of columns (the genes) and few rows (the experiments).

Numerous works present efficient algorithms which mine the patterns satisfying
a user-defined constraint in large databases. This constraint can combine
minimum and maximum frequency thresholds together with other syntactical
constraints. These algorithms are designed for databases with up to several
millions of rows. However, their complexity is exponential in the number of
columns and thus they are not suited for databases with too many columns, like
those encountered in genome biology.

Recently, two approaches were proposed to solve this problem: instead of mining
the original database, these algorithms work on the "transposed" database,
i.e., columns of the original database become rows in the "transposed" database
and rows become columns (this is indeed the same database but with a different
representation). Therefore the "transposed" database has significantly fewer
columns than the original one. The CARPENTER algorithm [18] is specifically
designed for mining the frequent closed patterns, and our proposition [23,24]
uses a classical algorithm for mining closed patterns with a monotonic (or
anti-monotonic) constraint. Both approaches use the transposition principle;
however, the problem of mining under constraints is not fully studied,
especially for complex constraints (i.e., conjunctions and disjunctions of
simple constraints).

In this paper, we study this problem from a theoretical point of view. Our 
aim is to use classical algorithms (constrained pattern mining algorithms or 
closed patterns mining algorithms) in the "transposed" database and to use 
their output to regenerate patterns of the original database instead of directly 
mining in the original database. 

There are several interesting questions which we will therefore try to answer:

1. What kind of information can be gathered in the "transposed" database on
the patterns of the original database?

2. Is it possible to "transpose" the constraints? I.e., given a database and a
constraint, is it possible to find a "transposed" constraint such that mining
the "transposed" database with the "transposed" constraint gives information
about the patterns which satisfy the original constraint in the original
database?

3. How can we regenerate the closed patterns in the original database from the
patterns extracted in the "transposed" database?

4. How can we generate all the itemsets satisfying a constraint using the
extracted closed patterns?

These questions will be addressed respectively in Sec. 2, 3, 4 and 5.

The organization of the paper is as follows: we start Sec. 2 by recalling
some usual definitions related to pattern mining and Galois connection. Then
we show in Sec. 3 how to transpose usual and complex constraints. Section 4
is a complete discussion about mining constrained closed patterns using the
"transposed" database and in Sec. 5 we show how to use this to compute all
the patterns (i.e., not only the closed ones) satisfying a constraint. Finally
Sec. 6 is a short conclusion.

2 Definitions 

To avoid confusion between rows (or columns) of the original database and rows
(columns) of the "transposed" database, we define a database as a relation
between two sets: a set of attributes and a set of objects. The set of
attributes (or items) is denoted $\mathcal{A}$ and the set of objects is
$\mathcal{O}$. The attribute space $2^{\mathcal{A}}$ is the collection of the
subsets of $\mathcal{A}$ and the object space $2^{\mathcal{O}}$ is the
collection of the subsets of $\mathcal{O}$. An attribute set (or itemset or
attribute pattern) is a subset of $\mathcal{A}$. An object set (or object
pattern) is a subset of $\mathcal{O}$. A database is a subset of
$\mathcal{A} \times \mathcal{O}$.

In this paper we consider that the database has more attributes than objects
and that we are interested in mining attribute sets. The database can be
represented as an adjacency matrix where objects are rows and attributes are 
columns (original representation) or where objects are columns and attributes 
are rows (transposed representation). 



Table 1. Original and transposed representations of a database. The attributes
are $\mathcal{A} = \{a_1, a_2, a_3, a_4\}$ and the objects are
$\mathcal{O} = \{o_1, o_2, o_3\}$. We use a string notation for object sets or
itemsets, e.g., $a_1a_3a_4$ denotes the itemset $\{a_1, a_3, a_4\}$ and
$o_2o_3$ denotes the object set $\{o_2, o_3\}$. This dataset is used in all the
examples.

2.1 Constraints

Given a database, a constraint $\mathcal{C}$ on an attribute set (resp. object
set) is a boolean function on $2^{\mathcal{A}}$ (resp. on $2^{\mathcal{O}}$).
Many constraints have been used in previous works. One of the most popular is
the minimum frequency constraint, which requires an itemset to be present in
more than a fixed number of objects. But we can also be interested in the
opposite, i.e., the maximum frequency constraint. Other constraints are related
to the Galois connection (see Sect. 2.2), such as closed [2] patterns, free
[6], contextual free [7] or key [2] patterns, or even non-derivable [S] or
emergent [25,11] patterns. There are also syntactical constraints, when one
focuses only on itemsets containing a fixed pattern (superset constraint),
contained in a fixed pattern (subset constraint), etc. Finally, when a
numerical value (such as a price) is associated to items, aggregate functions
such as sum, average, min, max, etc. can be used in constraints [16].

A constraint $\mathcal{C}$ is anti-monotonic if
$\forall A, B\ (A \subseteq B \wedge \mathcal{C}(B)) \Rightarrow \mathcal{C}(A)$.
A constraint $\mathcal{C}$ is monotonic if
$\forall A, B\ (A \subseteq B \wedge \mathcal{C}(A)) \Rightarrow \mathcal{C}(B)$.
In both definitions, $A$ and $B$ can be attribute sets or object sets. The
frequency constraint is anti-monotonic, like the subset constraint. The
anti-monotonicity property is important, because level-wise mining algorithms
use it most of the time to prune the search space. Indeed, when a pattern does
not satisfy the constraint, none of its specializations does either, and they
can be pruned [1].

Simple composition of constraints has good properties: the conjunction or
the disjunction of two anti-monotonic (resp. monotonic) constraints is
anti-monotonic (resp. monotonic). The negation of an anti-monotonic (resp.
monotonic) constraint is monotonic (resp. anti-monotonic).

2.2 Galois Connection 

The main idea underlying our work is to use the strong connection between
the itemset lattice $2^{\mathcal{A}}$ and the object lattice $2^{\mathcal{O}}$,
called the Galois connection. This connection was first used in pattern mining
when closed itemset mining algorithms were proposed [12], while it relates to
many works in concept learning [17,27].

Given a database $db$, the Galois operators $f$ and $g$ are defined as:

- $f$, called intension, is a function from $2^{\mathcal{O}}$ to
$2^{\mathcal{A}}$ defined by

$f(O) = \{a \in \mathcal{A} \mid \forall o \in O,\ (a, o) \in db\}$,

- $g$, called extension, is a function from $2^{\mathcal{A}}$ to
$2^{\mathcal{O}}$ defined by

$g(A) = \{o \in \mathcal{O} \mid \forall a \in A,\ (a, o) \in db\}$.

Given an itemset $A$, $g(A)$ is also called the support set of $A$ in $db$. It
is also the set of objects for which all the attributes of $A$ are true. The
frequency of $A$ is $|g(A)|$ and is denoted $\mathcal{F}(A)$.
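
The two operators can be sketched on a toy dataset as follows. This is only an illustration: the relation `DB` is an assumption, rebuilt from the concepts listed in Example 1, and the names `DB`, `f`, `g` and `frequency` are chosen here for readability.

```python
# Toy relation (assumption: rebuilt from the concepts listed in Example 1),
# stored object by object: DB[o] is the set of attributes true for o.
DB = {
    "o1": {"a1", "a2", "a3"},
    "o2": {"a1", "a2", "a3"},
    "o3": {"a2", "a3", "a4"},
}
ATTRIBUTES = frozenset().union(*DB.values())
OBJECTS = frozenset(DB)


def f(objects):
    """Intension: the attributes shared by every object of `objects`."""
    return frozenset(a for a in ATTRIBUTES if all(a in DB[o] for o in objects))


def g(attributes):
    """Extension (support set): the objects having every attribute of `attributes`."""
    return frozenset(o for o in OBJECTS if set(attributes) <= DB[o])


def frequency(attributes):
    """Frequency of an itemset: the size of its support set."""
    return len(g(attributes))


print(sorted(f({"o1", "o2"})))  # attributes common to o1 and o2
print(sorted(g({"a4"})))        # objects containing a4
print(frequency({"a2", "a3"}))  # |g(a2a3)|
```

Note that `f` applied to the empty object set returns all attributes, and `g` applied to the empty itemset returns all objects, as the definitions require.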

Both functions enable us to link the attribute space to the object space.
However, since the two spaces do not have the same cardinality, there is no
one-to-one mapping between them. This means that several itemsets can have the
same image in the object space and conversely. We thus define two equivalence
relations $r_a$ and $r_o$ on $2^{\mathcal{A}}$ and $2^{\mathcal{O}}$:

- if $A$ and $B$ are two itemsets, $A\ r_a\ B$ if $g(A) = g(B)$,

- if $O$ and $P$ are two sets of objects, $O\ r_o\ P$ if $f(O) = f(P)$.

In every equivalence class, there is a particular element: the largest (for
inclusion) element of an equivalence class is unique and is called a closed
attribute set (for $r_a$) or a closed object set (for $r_o$).

The Galois operators $f$ and $g$ lead by composition to two closure operators,
namely $h = f \circ g$ and $h' = g \circ f$. They relate to lattice or
hypergraph theory and have good properties [26]. The closed sets are then the
fixed points of the closure operators, and the closure of a set is the closed
set of its equivalence class. In the following we will refer to both $h$ and
$h'$ interchangeably with the notation $\mathrm{cl}$. We denote by
$\mathcal{C}_{close}$ the constraint which is satisfied by the itemsets or the
object sets which are closed.

If two itemsets are equivalent, their images are equal in the object space.
There is therefore no way to distinguish between them if the mining of the
closed patterns is performed in the object space. So, by using the Galois
connection to perform the search in the object space instead of the attribute
space, we will gather information about the equivalence classes of $r_a$
(identified by their closed pattern), not about all individual itemsets. This
answers the first question of the introduction, i.e., what kind of information
can be gathered in the transposed database on the patterns of the original
database. At best, we will only be able to discover closed patterns.




Fig. 1. The equivalence classes for $r_a$ in the itemset lattice (a) and for
$r_o$ in the object set lattice (b) built on the database of Tab. 1. The closed
sets are in bold face. The arrows represent the $f$ and $g$ operators between
the $a_1a_2a_3$ and $o_1o_2$ equivalence classes. The dotted arrows represent
the closure operators $h$ and $h'$.



Property 1. Some properties of $f$ and $g$.

- $f$ and $g$ are decreasing w.r.t. the inclusion order: if $A \subseteq B$
then $g(B) \subseteq g(A)$ (resp. $f(B) \subseteq f(A)$)

- If $A$ is an itemset and $O$ an object set, then $g(A)$ is a closed object
set and $f(O)$ a closed itemset

- fixed point: $A$ is closed if and only if $f(g(A)) = \mathrm{cl}(A) = A$
(resp. $g(f(O)) = \mathrm{cl}(O) = O$)

- $f \circ g \circ f = f$ and $g \circ f \circ g = g$

- $A \subseteq \mathrm{cl}(A)$

In the Galois connection framework, the association of a closed pattern of
attributes and the corresponding closed pattern of objects is called a concept.
Concept learning [17,27] has led to classification tasks and clustering
processes. We use this connection in this article through the link it provides
between the search spaces $2^{\mathcal{A}}$ and $2^{\mathcal{O}}$.

Example 1. In Fig. 1, the closed object sets are $\emptyset$, $o_3$, $o_1o_2$
and $o_1o_2o_3$. The closed itemsets are $a_2a_3$, $a_2a_3a_4$, $a_1a_2a_3$ and
$a_1a_2a_3a_4$. Since $f(o_1o_2) = a_1a_2a_3$ and $g(a_1a_2a_3) = o_1o_2$,
$(a_1a_2a_3, o_1o_2)$ is a concept. The others are $(a_2a_3, o_1o_2o_3)$,
$(a_2a_3a_4, o_3)$ and $(a_1a_2a_3a_4, \emptyset)$.
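
The closed sets and concepts of Example 1 can be recomputed by brute force. The sketch below assumes the toy relation `DB` rebuilt from the concepts listed in that example; `powerset`, `closed_itemsets`, `closed_object_sets` and `concepts` are illustrative names, and exhaustive enumeration is used only because the dataset is tiny.

```python
from itertools import combinations

# Toy relation (assumption: rebuilt from the concepts listed in Example 1).
DB = {"o1": {"a1", "a2", "a3"}, "o2": {"a1", "a2", "a3"}, "o3": {"a2", "a3", "a4"}}
ATTRIBUTES = sorted(set().union(*DB.values()))
OBJECTS = sorted(DB)


def f(objects):
    """Intension of an object set."""
    return frozenset(a for a in ATTRIBUTES if all(a in DB[o] for o in objects))


def g(attributes):
    """Extension of an itemset."""
    return frozenset(o for o in OBJECTS if set(attributes) <= DB[o])


def powerset(items):
    """All subsets of `items`, as frozensets."""
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


# A set is closed when it is a fixed point of the closure operator.
closed_itemsets = {A for A in powerset(ATTRIBUTES) if f(g(A)) == A}
closed_object_sets = {O for O in powerset(OBJECTS) if g(f(O)) == O}

# A concept pairs a closed object set with its intension.
concepts = {(f(O), O) for O in closed_object_sets}

for itemset, objects in sorted(concepts, key=lambda c: len(c[1])):
    print("".join(sorted(itemset)), "<->", "".join(sorted(objects)) or "(empty)")
```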

Closed sets of attributes are very useful for algorithms with a support
constraint because, as the maximal element of their equivalence class for
$r_a$, they share the same frequency with all the patterns in the class. Closed
set mining is now well known [12], and frequent closed patterns are known to be
less numerous than frequent patterns [5,9]. Today's approaches relate to
constrained closed set mining [3]. These patterns are good candidates for
constituting relevant concepts, which associate at the same time the attributes
and the objects. For example, biologists want to constrain their search to
attribute patterns containing some specific genes, with a specified maximum
length. They will also be interested in analyzing the other part of the
concept. We specifically address here the problem of constrained closed mining
in databases with more attributes than objects.

3 Constraint Transposition 

Most algorithms extracting closed patterns are search algorithms. The size of
the search space strongly determines their performance [12]. In our context,
the object space $2^{\mathcal{O}}$ is smaller than the attribute space
$2^{\mathcal{A}}$. We therefore choose to search the closed patterns in the
smaller space ($2^{\mathcal{O}}$) by transposing the database. In order to mine
under constraint, we study in this section how we can adapt constraints to the
new transposed database, i.e., how we can transpose constraints. We will
therefore answer question 2 of the introduction.

3.1 Definition and Properties 

Given an itemset constraint $\mathcal{C}$, we want to extract the collection
$I$ of itemsets, $I = \{A \subseteq \mathcal{A} \mid \mathcal{C}(A)\}$.
Therefore, we want to find in the transposed database a collection $T$ of
object sets (if it exists) such that the image by $f$ of this collection is
$I$, i.e., $\{f(O) \mid O \in T\} = I$. Since $f(O)$ is always a closed
itemset, this is only possible if the collection $I$ contains only closed
itemsets (i.e., if the constraint $\mathcal{C}$ includes the
$\mathcal{C}_{close}$ constraint). In this case, a solution for $T$ is the
collection $\{O \subseteq \mathcal{O} \mid \mathcal{C}(f(O))\}$, which leads to
the following definition of a transposed constraint:

Definition 1 (Transposed constraint). Given a constraint $\mathcal{C}$, we
define the transposed constraint ${}^t\mathcal{C}$ on a closed pattern $O$ of
objects as:

${}^t\mathcal{C}(O) = \mathcal{C}(f(O))$.

Example 2. Consider the itemset constraint $\mathcal{C}(A) = (a_1 \in A)$. Its
transposed constraint is (by definition)
${}^t\mathcal{C}(O) = (a_1 \in f(O))$. Using the dataset of Tab. 1, the object
sets that satisfy ${}^t\mathcal{C}$ are
$T = \{\emptyset, o_1, o_2, o_1o_2\}$. If we compute
$\{f(O) \mid O \in T\}$, we get $\{a_1a_2a_3, a_1a_2a_3a_4\}$, which are exactly
the closed itemsets that satisfy $\mathcal{C}$. Theorem 1 will show that this
is always the case.
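
Definition 1 and Example 2 can be replayed in a few lines. This is a sketch under assumptions: `DB` is the toy relation rebuilt from the concepts of Example 1, and `transpose`, `contains_a1` and `powerset` are illustrative helper names.

```python
from itertools import combinations

# Toy relation (assumption: rebuilt from the concepts listed in Example 1).
DB = {"o1": {"a1", "a2", "a3"}, "o2": {"a1", "a2", "a3"}, "o3": {"a2", "a3", "a4"}}
ATTRIBUTES = sorted(set().union(*DB.values()))
OBJECTS = sorted(DB)


def f(objects):
    return frozenset(a for a in ATTRIBUTES if all(a in DB[o] for o in objects))


def g(attributes):
    return frozenset(o for o in OBJECTS if set(attributes) <= DB[o])


def powerset(items):
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def transpose(constraint):
    """Transposed constraint of Definition 1: test the itemset constraint on f(O)."""
    return lambda objects: constraint(f(objects))


def contains_a1(itemset):
    """The itemset constraint of Example 2: a1 belongs to A."""
    return "a1" in itemset


t_contains_a1 = transpose(contains_a1)

# Object sets satisfying the transposed constraint, and their images by f.
T = {O for O in powerset(OBJECTS) if t_contains_a1(O)}
images = {f(O) for O in T}

# Closed itemsets satisfying the original constraint, computed directly.
closed_solutions = {A for A in powerset(ATTRIBUTES) if f(g(A)) == A and contains_a1(A)}

print(images == closed_solutions)
```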

It is interesting to study the effect of transposition w.r.t. the monotonicity
or anti-monotonicity of constraints, since many mining algorithms rely on them
for efficient pruning:

Proposition 1. If a constraint $\mathcal{C}$ is monotonic (resp.
anti-monotonic), the transposed constraint ${}^t\mathcal{C}$ is anti-monotonic
(resp. monotonic).

Proof: $f$ and $g$ are decreasing (Property 1), which inverts monotonicity and
anti-monotonicity. □
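
Proposition 1 can be checked exhaustively on a small example. The sketch below is a brute-force sanity check, not a proof; it assumes the toy relation `DB` rebuilt from the concepts of Example 1, and the two constraints and all helper names are chosen here for illustration.

```python
from itertools import combinations

# Toy relation (assumption: rebuilt from the concepts listed in Example 1).
DB = {"o1": {"a1", "a2", "a3"}, "o2": {"a1", "a2", "a3"}, "o3": {"a2", "a3", "a4"}}
ATTRIBUTES = sorted(set().union(*DB.values()))
OBJECTS = sorted(DB)


def f(objects):
    return frozenset(a for a in ATTRIBUTES if all(a in DB[o] for o in objects))


def powerset(items):
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_monotonic(constraint, universe):
    """C(X) and X included in Y imply C(Y), checked over all pairs."""
    sets = powerset(universe)
    return all(constraint(y) for x in sets if constraint(x) for y in sets if x <= y)


def is_anti_monotonic(constraint, universe):
    """C(Y) and X included in Y imply C(X), checked over all pairs."""
    sets = powerset(universe)
    return all(constraint(x) for y in sets if constraint(y) for x in sets if x <= y)


def transpose(constraint):
    return lambda objects: constraint(f(objects))


def has_a1(itemset):            # superset constraint, monotonic
    return "a1" in itemset


def within_a1a2a3(itemset):     # subset constraint, anti-monotonic
    return itemset <= {"a1", "a2", "a3"}


print(is_monotonic(has_a1, ATTRIBUTES), is_anti_monotonic(transpose(has_a1), OBJECTS))
print(is_anti_monotonic(within_a1a2a3, ATTRIBUTES), is_monotonic(transpose(within_a1a2a3), OBJECTS))
```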

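Since we also want to deal with complex constraints (i.e., constraints built
with elementary constraints using boolean operators), we need the following:

Proposition 2. If $\mathcal{C}$ and $\mathcal{C}'$ are two constraints then:

${}^t(\mathcal{C} \wedge \mathcal{C}') = {}^t\mathcal{C} \wedge {}^t\mathcal{C}'$

${}^t(\mathcal{C} \vee \mathcal{C}') = {}^t\mathcal{C} \vee {}^t\mathcal{C}'$

${}^t(\neg\mathcal{C}) = \neg\,{}^t\mathcal{C}$

Proof: For the conjunction:
${}^t(\mathcal{C} \wedge \mathcal{C}')(O) = (\mathcal{C} \wedge \mathcal{C}')(f(O)) = \mathcal{C}(f(O)) \wedge \mathcal{C}'(f(O)) = ({}^t\mathcal{C} \wedge {}^t\mathcal{C}')(O)$.
The proof is similar for the disjunction and the negation. □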

Many algorithms deal with conjunctions of anti-monotonic and monotonic
constraints. The last two propositions mean that these algorithms can be used
with the transposed constraints, since the transposed constraint of the
conjunction of a monotonic and an anti-monotonic constraint is again the
conjunction of a monotonic and an anti-monotonic constraint. The last
proposition also helps in building the transposition of a composed constraint.
It is useful for the algebraization [22] of the constraint mining problem,
where constraints are decomposed into disjunctions and conjunctions of
elementary constraints.

3.2 Transposed Constraints of Some Classical Constraints 

In the previous section, we gave the definition of the transposed constraint.
With this definition (${}^t\mathcal{C}(O) = \mathcal{C}(f(O))$), testing the
transposed constraint on an object set $O$ requires computing $f(O)$ (to come
back to the attribute space) and then testing $\mathcal{C}$. This means that a
mining algorithm using this constraint must maintain a dual context, i.e., it
must maintain for each object set $O$ the corresponding attribute set $f(O)$.
Some algorithms already do this, for instance algorithms which use the
so-called vertical representation of the database (like CHARM [15]). For some
classical constraints however, the transposed constraint can be rewritten in
order to avoid the use of $f(O)$. In this section, we review several classical
constraints and try to find a simple expression of their transposed constraint
in the object space.

Let us first consider the minimum frequency constraint: the transposed
constraint of
$\mathcal{C}_{\gamma\text{-freq}}(A) = (\mathcal{F}(A) > \gamma)$ is, by
Definition 1,
${}^t\mathcal{C}_{\gamma\text{-freq}}(O) = (\mathcal{F}(f(O)) > \gamma)$. By
definition of frequency,
$\mathcal{F}(f(O)) = |g(f(O))| = |\mathrm{cl}(O)|$, and if $O$ is a closed
object set, $\mathrm{cl}(O) = O$ and therefore
${}^t\mathcal{C}_{\gamma\text{-freq}}(O) = (|O| > \gamma)$. Thus, the transposed
constraint of the minimum frequency constraint is the "minimum size"
constraint. The CARPENTER [18] algorithm uses this property and mines the
closed patterns with a divide-and-conquer strategy, stopping when the size of
the object set drops below the threshold.
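
The equivalence between minimum frequency and minimum size can be checked on closed object sets, and it visibly fails on a non-closed one. The sketch assumes the toy relation `DB` rebuilt from the concepts of Example 1; `min_frequency` and `min_size` are illustrative names.

```python
from itertools import combinations

# Toy relation (assumption: rebuilt from the concepts listed in Example 1).
DB = {"o1": {"a1", "a2", "a3"}, "o2": {"a1", "a2", "a3"}, "o3": {"a2", "a3", "a4"}}
ATTRIBUTES = sorted(set().union(*DB.values()))
OBJECTS = sorted(DB)


def f(objects):
    return frozenset(a for a in ATTRIBUTES if all(a in DB[o] for o in objects))


def g(attributes):
    return frozenset(o for o in OBJECTS if set(attributes) <= DB[o])


def powerset(items):
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def min_frequency(gamma):
    """Itemset constraint F(A) > gamma."""
    return lambda itemset: len(g(itemset)) > gamma


def min_size(gamma):
    """Object set constraint |O| > gamma."""
    return lambda objects: len(objects) > gamma


closed_object_sets = [O for O in powerset(OBJECTS) if g(f(O)) == O]

# On closed object sets, testing F(f(O)) > gamma amounts to testing |O| > gamma.
agree = all(
    min_frequency(gamma)(f(O)) == min_size(gamma)(O)
    for gamma in range(4)
    for O in closed_object_sets
)
print(agree)

# On a non-closed object set the two quantities differ: cl(o1) = o1o2.
not_closed = frozenset({"o1"})
print(len(g(f(not_closed))), len(not_closed))
```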

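The next two propositions give the transposed constraints of two other
classical constraints: the subset and superset constraints.

Proposition 3 (subset constraint transposition). Let
$\mathcal{C}_{\subseteq E}$ be the constraint defined by
$\mathcal{C}_{\subseteq E}(A) = (A \subseteq E)$, where $E$ is a constant
itemset. Then, if $E$ is closed ($O$ is an object set):

${}^t\mathcal{C}_{\subseteq E}(O) \Leftrightarrow g(E) \subseteq \mathrm{cl}(O)$

and if $E$ is not closed:

${}^t\mathcal{C}_{\subseteq E}(O) \Rightarrow g(E) \subseteq \mathrm{cl}(O)$.

Proof:
${}^t\mathcal{C}_{\subseteq E}(O) \Leftrightarrow \mathcal{C}_{\subseteq E}(f(O)) \Leftrightarrow (f(O) \subseteq E) \Rightarrow (g(E) \subseteq g(f(O))) \Leftrightarrow (g(E) \subseteq \mathrm{cl}(O))$.
Conversely (if $E$ is closed):
$(g(E) \subseteq g(f(O))) \Rightarrow (f(O) \subseteq \mathrm{cl}(E)) \Rightarrow (f(O) \subseteq E)$. □

Proposition 4 (superset constraint transposition). Let
$\mathcal{C}_{\supseteq E}$ be the constraint defined by
$\mathcal{C}_{\supseteq E}(A) = (A \supseteq E)$, where $E$ is a constant
itemset. Then:

${}^t\mathcal{C}_{\supseteq E}(O) \Leftrightarrow g(E) \supseteq \mathrm{cl}(O)$.

Proof:
${}^t\mathcal{C}_{\supseteq E}(O) \Leftrightarrow (E \subseteq f(O)) \Rightarrow (g(f(O)) \subseteq g(E)) \Leftrightarrow (\mathrm{cl}(O) \subseteq g(E))$.
Conversely,
$(g(f(O)) \subseteq g(E)) \Rightarrow (f(g(E)) \subseteq f(g(f(O)))) \Rightarrow (f(g(E)) \subseteq f(O)) \Rightarrow (\mathrm{cl}(E) \subseteq f(O)) \Rightarrow (E \subseteq f(O))$. □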
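
Both propositions can be verified exhaustively on a small relation, including the fact that the converse of Proposition 3 can fail when $E$ is not closed. The sketch assumes the toy relation `DB` rebuilt from the concepts of Example 1; all names are illustrative.

```python
from itertools import combinations

# Toy relation (assumption: rebuilt from the concepts listed in Example 1).
DB = {"o1": {"a1", "a2", "a3"}, "o2": {"a1", "a2", "a3"}, "o3": {"a2", "a3", "a4"}}
ATTRIBUTES = sorted(set().union(*DB.values()))
OBJECTS = sorted(DB)


def f(objects):
    return frozenset(a for a in ATTRIBUTES if all(a in DB[o] for o in objects))


def g(attributes):
    return frozenset(o for o in OBJECTS if set(attributes) <= DB[o])


def powerset(items):
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def cl_objects(objects):
    """Closure of an object set."""
    return g(f(objects))


ITEMSETS = powerset(ATTRIBUTES)
OBJECT_SETS = powerset(OBJECTS)

# Proposition 4: E included in f(O)  <=>  cl(O) included in g(E), for every E.
prop4 = all((E <= f(O)) == (cl_objects(O) <= g(E)) for E in ITEMSETS for O in OBJECT_SETS)

# Proposition 3, implication valid for every E.
prop3_implication = all(
    g(E) <= cl_objects(O) for E in ITEMSETS for O in OBJECT_SETS if f(O) <= E
)

# Proposition 3, equivalence when E is closed.
prop3_closed = all(
    (f(O) <= E) == (g(E) <= cl_objects(O))
    for E in ITEMSETS if f(g(E)) == E
    for O in OBJECT_SETS
)

# Pairs (E, O) with E not closed where the converse of Proposition 3 fails.
counterexamples = [
    (E, O)
    for E in ITEMSETS if f(g(E)) != E
    for O in OBJECT_SETS
    if g(E) <= cl_objects(O) and not f(O) <= E
]

print(prop4, prop3_implication, prop3_closed, len(counterexamples) > 0)
```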

These two syntactical constraints are interesting because they can be used
to construct many other kinds of constraints. In fact, all syntactical
constraints can be built on top of these using conjunctions, disjunctions and
negations. With Proposition 2, it is then possible to compute the transposition
of many syntactical constraints. Besides, these constraints have been
identified in [13,4] to formalize dataset reduction techniques.

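Table 2 gives the transposed constraints of several classical constraints when
the object set $O$ is closed (this is not an important restriction since we
will only use closed itemset extraction algorithms). These transposed
constraints are easily obtained using the two previous propositions on the
superset and the subset constraints and Prop. 2. For instance, if
$\mathcal{C}(A) = (A \cap E \neq \emptyset)$, this can be rewritten
$A \not\subseteq \overline{E}$ ($\overline{E}$ denotes the complement of $E$,
i.e., $\mathcal{A} \setminus E$) and then $\neg(A \subseteq \overline{E})$. The
transposed constraint is therefore, using Prop. 2 and 3,
$\neg(g(\overline{E}) \subseteq O)$ (if $\overline{E}$ is closed) and finally
$g(\overline{E}) \not\subseteq O$. If $\overline{E}$ is not closed, then we
write $E = \{e_1, \dots, e_n\}$ and we rewrite the constraint as
$\mathcal{C}(A) = (e_1 \in A \vee e_2 \in A \vee \dots \vee e_n \in A)$ and
then, using Prop. 2 and 4, we obtain the transposed constraint
${}^t\mathcal{C}(O) = (O \subseteq g(e_1) \vee \dots \vee O \subseteq g(e_n))$.
These expressions are interesting since they do not involve the computation of
$f(O)$. Instead, they involve $g(E)$ or $g(e_i)$. However, since $E$ is
constant, these values need to be computed only once (during the first database
pass, for instance).

Table 2. Transposed constraints of some classical constraints. $A$ is a
variable closed itemset, $E = \{e_1, e_2, \dots, e_n\}$ a constant itemset, $O$
a variable closed object set and
$\overline{E} = \mathcal{A} \setminus E = \{f_1, f_2, \dots, f_m\}$.

Itemset constraint $\mathcal{C}(A)$    Transposed constraint ${}^t\mathcal{C}(O)$

$\mathcal{F}(A)\ \theta\ \alpha$       $|O|\ \theta\ \alpha$

$A \subseteq E$                        if $E$ is closed: $g(E) \subseteq O$
                                       else: $O \not\subseteq g(f_1) \wedge \dots \wedge O \not\subseteq g(f_m)$

$E \subseteq A$                        $O \subseteq g(E)$

$A \not\subseteq E$                    if $E$ is closed: $g(E) \not\subseteq O$
                                       else: $O \subseteq g(f_1) \vee \dots \vee O \subseteq g(f_m)$

$E \not\subseteq A$                    $O \not\subseteq g(E)$

$A \cap E = \emptyset$                 if $\overline{E}$ is closed: $g(\overline{E}) \subseteq O$
                                       else: $O \not\subseteq g(e_1) \wedge \dots \wedge O \not\subseteq g(e_n)$

$A \cap E \neq \emptyset$              if $\overline{E}$ is closed: $g(\overline{E}) \not\subseteq O$
                                       else: $O \subseteq g(e_1) \vee \dots \vee O \subseteq g(e_n)$

$\mathrm{SUM}(A)\ \theta\ \alpha$      $\mathcal{F}_p(O)\ \theta\ \alpha$

$\mathrm{MIN}(A)\ \theta\ \alpha$      see text

$\mathrm{MAX}(A)\ \theta\ \alpha$      see text

$\theta \in \{<, >, \leq, \geq\}$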

Example 3. We show in this example how to compute the transposed constraints
with the database of Tab. 1. Let the itemset constraint be
$\mathcal{C}(A) = (A \cap a_1a_4 \neq \emptyset)$. In the database of Tab. 1,
the itemset $\overline{a_1a_4} = a_2a_3$ is closed. Therefore, the transposed
constraint is (Tab. 2)
${}^t\mathcal{C}(O) = (g(a_2a_3) \not\subseteq O)$. Since
$g(a_2a_3) = o_1o_2o_3$,
${}^t\mathcal{C}(O) = (o_1o_2o_3 \not\subseteq O)$. The closed object sets that
satisfy this constraint are $T = \{\emptyset, o_1o_2, o_3\}$. If we apply $f$
to go back to the itemset space:
$\{f(O) \mid O \in T\} = \{a_1a_2a_3a_4, a_1a_2a_3, a_2a_3a_4\}$, which are, as
expected (and proved by Th. 1), the closed itemsets which satisfy
$\mathcal{C}$.

Consider now the constraint
$\mathcal{C}(A) = (A \cap a_1a_2 \neq \emptyset)$. In this case,
$\overline{a_1a_2} = a_3a_4$ is not closed. Therefore, we use the second
expression in Tab. 2 to compute its transposition:
${}^t\mathcal{C}(O) = (O \subseteq g(a_1) \vee O \subseteq g(a_2))$. Since
$g(a_1) = o_1o_2$ and $g(a_2) = o_1o_2o_3$,
${}^t\mathcal{C}(O) = (O \subseteq o_1o_2 \vee O \subseteq o_1o_2o_3)$, which
can be simplified to ${}^t\mathcal{C}(O) = (O \subseteq o_1o_2o_3)$. All the
closed object sets satisfy this constraint ${}^t\mathcal{C}$, which is not
surprising since all the closed itemsets satisfy $\mathcal{C}$.

Our last example is the constraint
$\mathcal{C}(A) = (|A \cap a_1a_2a_4| \geq 2)$. It can be rewritten
$\mathcal{C}(A) = ((a_1a_2 \subseteq A) \vee (a_1a_4 \subseteq A) \vee (a_2a_4 \subseteq A))$.
Using Prop. 2 and Tab. 2, we get
${}^t\mathcal{C}(O) = ((O \subseteq g(a_1a_2)) \vee (O \subseteq g(a_1a_4)) \vee (O \subseteq g(a_2a_4)))$,
which is
${}^t\mathcal{C}(O) = ((O \subseteq o_1o_2) \vee (O \subseteq \emptyset) \vee (O \subseteq o_3))$.
The closed object sets satisfying ${}^t\mathcal{C}$ are
$T = \{\emptyset, o_1o_2, o_3\}$ and
$\{f(O) \mid O \in T\} = \{a_1a_2a_3a_4, a_1a_2a_3, a_2a_3a_4\}$.

Other interesting constraints include aggregate constraints [16]. If a
numerical value $a.v$ is associated to each attribute $a \in \mathcal{A}$, we
can define constraints of the form $\mathrm{SUM}(A)\ \theta\ \alpha$ for
several aggregate operators such as SUM, MIN, MAX or AVG, where
$\theta \in \{<, >, \leq, \geq\}$ and $\alpha$ is a numerical value. In this
case, $\mathrm{SUM}(A)$ denotes the sum of all $a.v$ for all attributes $a$ in
$A$.

The constraints $\mathrm{MIN}(A)\ \theta\ \alpha$ and
$\mathrm{MAX}(A)\ \theta\ \alpha$ are special cases of the constraints of
Tab. 2. For instance, if
$sup_\alpha = \{a \in \mathcal{A} \mid a.v > \alpha\}$ then
$\mathrm{MIN}(A) > \alpha$ is exactly $A \subseteq sup_\alpha$ and
$\mathrm{MIN}(A) \leq \alpha$ is $A \not\subseteq sup_\alpha$. The same kind of
relation holds for the MAX operator: $\mathrm{MAX}(A) > \alpha$ is equivalent
to $A \cap sup_\alpha \neq \emptyset$ and $\mathrm{MAX}(A) \leq \alpha$ is
equivalent to $A \cap sup_\alpha = \emptyset$. Since $\alpha$ is a constant,
the set $sup_\alpha$ can be pre-computed.

The constraints $\mathrm{AVG}(A)\ \theta\ \alpha$ and
$\mathrm{SUM}(A)\ \theta\ \alpha$ are more difficult. Indeed, we found an
expression without $f(O)$ only for the transposition of
$\mathrm{SUM}(A)\ \theta\ \alpha$. Its transposition is
${}^t\mathcal{C}(O) = (\mathrm{SUM}(f(O))\ \theta\ \alpha)$. In the database,
$f(O)$ is a set of attributes, so in the transposed database, it is a set of
rows and $O$ is a set of columns. The values $a.v$ are attached to rows of the
transposed database, and $\mathrm{SUM}(f(O))$ is the sum of these values for
the rows containing $O$. Therefore, $\mathrm{SUM}(f(O))$ is a weighted
frequency of $O$ (in the transposed database) where each row $a$ containing $O$
contributes $a.v$ to the total (we denote this weighted frequency by
$\mathcal{F}_p(O)$). It is easy to adapt classical algorithms to count this
weighted frequency. Its computation is the same as for the classical frequency,
except that each row containing the counted itemset contributes a value
different from 1 to the frequency.
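
The weighted frequency can be sketched as a single pass over the rows of the transposed database. The rows in `TRANSPOSED` are an assumption (the toy relation rebuilt from the concepts of Example 1, stored attribute by attribute), and the values in `VALUE` are purely hypothetical; `weighted_frequency` is an illustrative name.

```python
# Transposed representation (assumption: toy relation rebuilt from Example 1):
# one row per attribute a, holding the objects of g(a).
TRANSPOSED = {
    "a1": {"o1", "o2"},
    "a2": {"o1", "o2", "o3"},
    "a3": {"o1", "o2", "o3"},
    "a4": {"o3"},
}

# Hypothetical numerical values a.v attached to the attributes (rows).
VALUE = {"a1": 5, "a2": 1, "a3": 2, "a4": 10}


def f(objects):
    """Attributes (rows of the transposed database) whose row contains `objects`."""
    return frozenset(a for a, row in TRANSPOSED.items() if set(objects) <= row)


def weighted_frequency(objects):
    """Like a frequency count, but each matching row a contributes a.v instead of 1."""
    return sum(VALUE[a] for a, row in TRANSPOSED.items() if set(objects) <= row)


def transposed_sum_greater(objects, alpha):
    """Transposed form of SUM(A) > alpha, evaluated without materialising f(O)."""
    return weighted_frequency(objects) > alpha


for objects in ({"o3"}, {"o1", "o2"}, set()):
    # The weighted frequency coincides with SUM over f(O).
    print(sorted(objects), weighted_frequency(objects), sum(VALUE[a] for a in f(objects)))
```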

4 Closed Itemsets Mining 

In a previous work [23] we showed the complementarity of the set of concepts
mined in the database under a constraint on the attribute patterns and the set
of concepts mined in the transposed database with the negation of the
transposed constraint, when the original constraint is anti-monotonic. The
transposed constraint had to be negated in order to ensure the
anti-monotonicity of the constraint used by the algorithm. This is important
because we can keep usual mining algorithms which deal with anti-monotonic
constraints and apply them in the transposed database with the negation of the
transposed constraint. We also showed [24] a specific way of mining under a
monotonic constraint, by simply mining the transposed database with the
transposed constraint (which is anti-monotonic). In this section, we generalize
these results to more general constraints.

We define the constrained closed itemset mining problem: given a database
$db$ and a constraint $\mathcal{C}$, we want to extract all the closed itemsets
(and their frequencies) satisfying the constraint $\mathcal{C}$ in the database
$db$. More formally, we want to compute the collection:

$\{(A, \mathcal{F}(A, db)) \mid \mathcal{C}(A, db) \wedge \mathcal{C}_{close}(A, db)\}$.

The next theorem shows how to compute the above solution set using the
closed object patterns extracted in the transposed database, with the help of
the transposed constraint.

Theorem 1.

$\{A \mid \mathcal{C}(A) \wedge \mathcal{C}_{close}(A)\} = \{f(O) \mid {}^t\mathcal{C}(O) \wedge \mathcal{C}_{close}(O)\}$.

Proof: By Def. 1,
$\{f(O) \mid {}^t\mathcal{C}(O) \wedge \mathcal{C}_{close}(O)\} = \{f(O) \mid \mathcal{C}(f(O)) \wedge \mathcal{C}_{close}(O)\} = \{A \mid \exists O \text{ s.t. } \mathcal{C}(A) \wedge A = f(O)\} = \{A \mid \mathcal{C}(A) \wedge \mathcal{C}_{close}(A)\}$. □

This theorem means that if we extract the collection of all closed object
patterns satisfying ${}^t\mathcal{C}$ in the transposed database, then we can
get all the closed patterns satisfying $\mathcal{C}$ by computing $f(O)$ for
all the closed object patterns. The fact that we only need the closed object
patterns and not all the object patterns is very interesting, since the closed
patterns are less numerous and can be extracted more efficiently (see CHARM
[28], CARPENTER [18], CLOSET [21] or [7]). The strategy which we propose for
computing the solution of the constrained closed itemset mining problem is
therefore:

1. Compute the transposed constraint ${}^t\mathcal{C}$ using Tab. 2 and
Prop. 2. This step can involve the computation of some constant object sets
$g(E)$ used in the transposed constraint.

2. Use one of the known algorithms to extract the constrained closed sets of
the transposed database. Most closed set extraction algorithms do not use
constraints (like CLOSE, CLOSET or CARPENTER). However, it is not difficult to
integrate them (by adding more pruning steps) for monotonic or anti-monotonic
constraints. In [7], another algorithm to extract constrained closed sets is
presented.

3. Compute $f(O)$ for each extracted closed object pattern. In fact, every
algorithm already computes this when counting the frequency of $O$, which is
$|f(O)|$. The frequency of $f(O)$ (in the original database) is simply the size
of $O$ and can therefore be provided without any access to the database.

The first and third steps can indeed be integrated in the core of the mining 
algorithm, as it is done in the CARPENTER algorithm (but only with the 
frequency constraint). 

Finally, this strategy shows how to perform constrained closed itemset min- 
ing by processing all the computations in the transposed database, and using 
classical algorithms. 
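
The three-step strategy can be sketched end to end on a toy relation. This is a deliberately naive stand-in: step 2 enumerates every object set instead of calling a real closed-set miner such as those cited above, `DB` is an assumption (rebuilt from the concepts of Example 1), and `mine_closed_itemsets` is a hypothetical helper name.

```python
from itertools import combinations

# Toy relation (assumption: rebuilt from the concepts listed in Example 1).
DB = {"o1": {"a1", "a2", "a3"}, "o2": {"a1", "a2", "a3"}, "o3": {"a2", "a3", "a4"}}
ATTRIBUTES = sorted(set().union(*DB.values()))
OBJECTS = sorted(DB)


def f(objects):
    return frozenset(a for a in ATTRIBUTES if all(a in DB[o] for o in objects))


def g(attributes):
    return frozenset(o for o in OBJECTS if set(attributes) <= DB[o])


def powerset(items):
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def mine_closed_itemsets(transposed_constraint):
    """Steps 2 and 3, naively: keep the closed object sets satisfying the
    transposed constraint, then map each one to (f(O), |O|)."""
    result = {}
    for objects in powerset(OBJECTS):
        if g(f(objects)) == objects and transposed_constraint(objects):
            # For a closed object set, the frequency of f(O) is simply |O|.
            result[f(objects)] = len(objects)
    return result


# Step 1: transpose C(A) = (E included in A) with E = {a1}.
# Table 2 gives (O included in g(E)); g(E) is a constant computed once.
E = frozenset({"a1"})
g_E = g(E)
solution = mine_closed_itemsets(lambda objects: objects <= g_E)

for itemset, freq in sorted(solution.items(), key=lambda kv: kv[1]):
    print("".join(sorted(itemset)), freq)
```

A real implementation would push the transposed constraint into the search to prune it, rather than filter after enumeration as done here.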

5 Itemsets Mining 

In this section, we study how to extract all the itemsets that satisfy a user
constraint (and not only the closed ones). We define the constrained itemset
mining problem: given a database $db$ and a constraint $\mathcal{C}$, we want
to extract all the itemsets (and their frequencies) satisfying the constraint
$\mathcal{C}$ in the database $db$. More formally, we want to compute the
collection:

$\{(A, \mathcal{F}(A, db)) \mid \mathcal{C}(A, db)\}$.

In the previous section, we gave a strategy to compute the closed itemsets
satisfying a constraint. We will of course make use of this strategy. Solving
the constrained itemset mining problem will involve three steps. Given a
database $db$ and a constraint $\mathcal{C}$,

1. find a constraint $\mathcal{C}'$,

2. compute the collection
$\{(A, \mathcal{F}(A, db)) \mid \mathcal{C}'(A, db) \wedge \mathcal{C}_{close}(A, db)\}$
of closed sets satisfying $\mathcal{C}'$ using the strategy of Sec. 4,

3. compute the collection
$\{(A, \mathcal{F}(A, db)) \mid \mathcal{C}(A, db)\}$ of all the itemsets
satisfying $\mathcal{C}$ from the closed ones satisfying $\mathcal{C}'$.

We will study the first step in the next subsection and the third one in
Sec. 5.2, but first we will show why it is necessary to introduce a new
constraint $\mathcal{C}'$. Indeed, it is not always possible to compute all the
itemsets that satisfy $\mathcal{C}$ from the closed sets that satisfy
$\mathcal{C}$. Let us first recall how the third step is done in the classical
case where $\mathcal{C}$ is the frequency constraint [19]:

The main property used is that all the itemsets of an equivalence class have
the same frequency as the closed itemset of the class. Therefore, if we know
the frequency of the closed itemsets, it is possible to deduce the frequency of
non-closed itemsets, provided we are able to know to which class they belong.
The regeneration algorithm of [19] uses a top-down approach. Starting from the
largest frequent closed itemsets, it generates their subsets and assigns them
their frequencies, until all the itemsets have been generated.
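
The regeneration step for the frequency constraint can be sketched as follows. This is a simplified variant written for illustration, not the algorithm of [19]: every subset of a frequent closed itemset receives the largest frequency found among its closed supersets, which is the frequency of its closure. The input values assume the toy relation of Example 1 with a threshold "frequency > 1"; `regenerate` is a hypothetical name.

```python
from itertools import combinations


def regenerate(closed_with_frequency):
    """Rebuild all frequent itemsets and their frequencies from the frequent
    closed itemsets. The closure of an itemset is its smallest closed superset,
    hence the closed superset with the largest frequency."""
    frequencies = {}
    for closed, freq in closed_with_frequency.items():
        items = sorted(closed)
        for size in range(len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                frequencies[key] = max(frequencies.get(key, 0), freq)
    return frequencies


# Frequent closed itemsets of the toy relation for "frequency > 1"
# (assumption: values derived from the concepts listed in Example 1).
frequent_closed = {
    frozenset({"a2", "a3"}): 3,
    frozenset({"a1", "a2", "a3"}): 2,
}

all_frequent = regenerate(frequent_closed)
for itemset in sorted(all_frequent, key=lambda s: (len(s), sorted(s))):
    print("".join(sorted(itemset)) or "(empty)", all_frequent[itemset])
```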

Now, assume that the constraint $\mathcal{C}$ is not the frequency constraint
and that we have computed all the closed itemsets (and their frequencies) that
satisfy $\mathcal{C}$. If an itemset satisfies $\mathcal{C}$, it is possible
that its closure does not satisfy it. In this case, it is not possible to
compute the frequency of this itemset from the collection of the closed
itemsets that satisfy $\mathcal{C}$ (this is illustrated in Fig. 2).
Consequently, the collection of the closed itemsets satisfying $\mathcal{C}$ is
not sufficient to generate the non-closed itemsets. In the next section, we
show how the constraint $\mathcal{C}$ can be relaxed to enable the generation
of all the non-closed itemsets satisfying it.

5.1 Relaxation of the Constraint 

In order to be able to generate all the itemsets from the closed ones, it is necessary 
to have at least the collection of closed itemsets of all the equivalence classes 
that contain an itemset satisfying the constraint C. This collection is also the 
collection of the closures of all itemsets satisfying C : {c\{A) \ C{A, db)}. 

Fig. 2. The dots represent itemsets, the crosses (x) are closed itemsets, and the lines 
enclose the equivalence classes. The itemsets inside the region delimited by the dashed 
line satisfy the constraint C and the others do not. The closed sets satisfying C 
are the closed sets of classes 3, 4 and 5. They make it possible to generate the itemsets 
of these three classes. However, to get the two itemsets of class 2, we need the 
closed itemset of this class, which does not satisfy C. Therefore, in this case, 
having the closed itemsets satisfying C is not enough to generate all itemsets 
satisfying C. 

We must therefore find a constraint C' such that $\{\mathrm{cl}(A) \mid C(A, db)\}$ is included 
in $\{A \mid C'(A, db) \wedge C_{\mathrm{close}}(A)\}$. We call such a constraint C' a good relaxation 
of C (see Fig. 3). If we have an equality instead of the inclusion, we call C' an 
optimal relaxation of C. For example, the constant "true" constraint (which 
is true on all itemsets) is a good relaxation of any constraint; however, it is not 
very interesting since it will not provide any pruning opportunity during the 
extraction of step 2. 
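
The two definitions can be checked by brute force on a small example. The following Python sketch is only an illustration: the toy database `DB`, the helper names and the exhaustive enumeration are assumptions made here, not part of the framework, and the enumeration is only practical for a handful of items.

```python
from itertools import combinations

# Hypothetical toy database (object -> items), chosen only for illustration.
DB = {
    "o1": {"a1", "a2", "a3"},
    "o2": {"a1", "a2", "a3"},
    "o3": {"a2", "a3", "a4"},
}
ITEMS = frozenset().union(*DB.values())


def closure(itemset):
    """cl(A): the items shared by every object that contains A."""
    rows = [row for row in DB.values() if itemset <= row]
    # An itemset contained in no object is closed into the full item set.
    return frozenset(set.intersection(*rows)) if rows else ITEMS


def all_itemsets():
    """Enumerate every subset of ITEMS (toy sizes only)."""
    items = sorted(ITEMS)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def needed_closed_sets(c):
    """{cl(A) | C(A)}: closed itemsets required to regenerate C."""
    return {closure(a) for a in all_itemsets() if c(a)}


def selected_closed_sets(c_relaxed):
    """{A | C'(A) and A closed}: what mining under C' returns."""
    return {a for a in all_itemsets() if closure(a) == a and c_relaxed(a)}


def is_good_relaxation(c, c_relaxed):
    return needed_closed_sets(c) <= selected_closed_sets(c_relaxed)


def is_optimal_relaxation(c, c_relaxed):
    return needed_closed_sets(c) == selected_closed_sets(c_relaxed)


E = frozenset({"a2", "a4"})
c = lambda a: a <= E                   # C(A)  = (A subset of E)
c_relaxed = lambda a: a <= closure(E)  # C'(A) = (A subset of cl(E))
always_true = lambda a: True

print(is_good_relaxation(c, c_relaxed))
print(is_good_relaxation(c, always_true))
print(is_optimal_relaxation(c, always_true))
```

On this toy database the constant "true" constraint is accepted as a good relaxation but rejected as an optimal one, because it also selects closed itemsets that are the closure of no itemset satisfying C.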



Fig. 3. An optimal relaxation of C. The constraint C is represented by the solid 
line and an optimal relaxation is represented by the dashed line. 

If the closed itemsets (and their frequencies) satisfying an optimal relaxation 
of C are computed in step 2, we will have enough information for regenerating 
all itemsets satisfying C in step 3. However, it is not always possible to find such 
an optimal relaxation. We can then still use a good relaxation in step 2, at the 
cost of some superfluous closed itemsets that will be present in the collection 
and will have to be filtered out in step 3. 

We will now give optimal relaxations for some classical constraints, and we 
start with two trivial cases: 

Proposition 5. The optimal relaxation of a monotonic constraint is the con- 
straint itself and the optimal relaxation of the frequency constraint is the fre- 
quency constraint itself. 

Proof: Let C be a monotonic constraint or a frequency constraint. We only 
have to prove that if an itemset A satisfies C then cl(A) also does. If C is monotonic, 
this is true since $S \subseteq \mathrm{cl}(S)$ for any itemset S (Prop. 1), so that 
$A \subseteq \mathrm{cl}(A)$. If C is a minimum frequency constraint, 
it is true because A and cl(A) have the same frequency. □ 

The next proposition is used to compute the relaxation of a complex con- 
straint from the relaxation of simple constraints. 
Proposition 6. If $C_1$ and $C_2$ are two constraints and $C'_1$ and $C'_2$ are optimal 
relaxations of them, then: 

- $C'_1 \vee C'_2$ is an optimal relaxation of $C_1 \vee C_2$ and 

- $C'_1 \wedge C'_2$ is a good relaxation of $C_1 \wedge C_2$. 

Proof: A constraint C' is a good relaxation of a constraint C if and only if 
$\forall A,\ C(A) \Rightarrow C'(\mathrm{cl}(A))$. To prove that it is an optimal relaxation, we must 
also prove that if A is closed and satisfies C' then there exists an itemset B 
satisfying C such that $\mathrm{cl}(B) = A$ (cf. definitions). We will use these two facts 
in our proofs. 

Let A be an itemset satisfying $C_1 \wedge C_2$. This means that A satisfies $C_1$ and $C_2$. 
Therefore, cl(A) satisfies $C'_1$ and $C'_2$, i.e., cl(A) satisfies $C'_1 \wedge C'_2$. This means 
that $C'_1 \wedge C'_2$ is a good relaxation of $C_1 \wedge C_2$. 

We can prove similarly that $C'_1 \vee C'_2$ is a good relaxation of $C_1 \vee C_2$. Let us 
now prove that it is optimal: Let A be a closed itemset satisfying $C'_1 \vee C'_2$. 
Then A satisfies $C'_1$ or $C'_2$; suppose that it satisfies $C'_1$. Since $C'_1$ is an optimal 
relaxation of $C_1$, there exists B satisfying $C_1$ such that $\mathrm{cl}(B) = A$. Therefore 
B satisfies $C_1 \vee C_2$ and $\mathrm{cl}(B) = A$. □ 

We found no relaxation for the negation of a constraint, but this is not a 
problem. If the constraint is simple (i.e., in Tab. 2), its negation is also in the 
table, and if it is complex, then we can "push" the negation into the constraint 
as shown in the next example. 

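Example 4. Let $C(A) = \neg(((\mathcal{F}(A) > 3) \wedge (A \not\subseteq E)) \vee (A \cap F = \emptyset))$ where 
E and F are two constant itemsets. We can push the negation and we get: 
$C(A) = ((\neg(\mathcal{F}(A) > 3) \vee \neg(A \not\subseteq E)) \wedge \neg(A \cap F = \emptyset))$, and finally: 

$C(A) = (((\mathcal{F}(A) \le 3) \vee (A \subseteq E)) \wedge (A \cap F \ne \emptyset))$. 

Then with Prop. 5, Prop. 6 and Tab. 3 we can compute a good relaxation C' of C: 

$C'(A) = (((\mathcal{F}(A) \le 3) \vee (A \subseteq \mathrm{cl}(E))) \wedge (A \cap F \ne \emptyset))$. 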
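
Pushing the negation is a mechanical application of De Morgan's laws. The sketch below shows one possible way to do it in Python; the tuple encoding of formulas and the atom names are hypothetical and serve only to reproduce the rewriting of Example 4.

```python
# Hypothetical atom names; each atomic constraint is paired with its negation.
NEGATION = {
    "freq > 3": "freq <= 3",
    "A not subset of E": "A subset of E",
    "A inter F empty": "A inter F not empty",
}
NEGATION.update({neg: atom for atom, neg in list(NEGATION.items())})


def push_negation(formula, negate=False):
    """Return an equivalent formula whose only operators are 'and' and 'or'."""
    kind = formula[0]
    if kind == "atom":
        return ("atom", NEGATION[formula[1]] if negate else formula[1])
    if kind == "not":
        return push_negation(formula[1], not negate)
    # De Morgan: a negated 'and' becomes 'or' and vice versa.
    if negate:
        kind = "or" if kind == "and" else "and"
    return (kind,) + tuple(push_negation(sub, negate) for sub in formula[1:])


# The constraint of Example 4 before the negation is pushed.
example_4 = (
    "not",
    ("or",
     ("and", ("atom", "freq > 3"), ("atom", "A not subset of E")),
     ("atom", "A inter F empty")),
)
print(push_negation(example_4))
```

The result contains only atomic constraints combined with "and" and "or", so that the relaxations of the atoms can be combined with Prop. 6.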

Table 3 gives good relaxations of the other constraints of Tab. 2 which are 
not covered by the previous proposition (i.e., which are not monotonic), except 
for the non-monotonic constraints involving SUM, for which we did not find any 
interesting (i.e., other than the constant true constraint) good relaxation. 

Proof: We prove here the results given in Tab. 3. 

$C(A) = (A \subseteq E)$, $C'(A) = (A \subseteq \mathrm{cl}(E))$: If $A \subseteq E$ then $\mathrm{cl}(A) \subseteq \mathrm{cl}(E)$. This 
means that $C(A) \Rightarrow C'(\mathrm{cl}(A))$, therefore C' is a good relaxation of C. 

$C(A) = (A \cap E = \emptyset)$: C can be rewritten $C(A) = (A \subseteq \overline{E})$ and the previous 
case applies with $\overline{E}$ instead of E. 
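Table 3. Good relaxation of some classical constraints. A is a variable closed 
itemset, $E = \{e_1, e_2, \dots, e_n\}$ a constant itemset. 

Itemset constraint C(A)        Good relaxation C'(A) 

$A \subseteq E$                $A \subseteq \mathrm{cl}(E)$ 
$E \not\subseteq A$            $A \subseteq \mathrm{cl}(\overline{\{e_1\}}) \vee A \subseteq \mathrm{cl}(\overline{\{e_2\}}) \vee \dots \vee A \subseteq \mathrm{cl}(\overline{\{e_n\}})$ 
$A \cap E = \emptyset$         $A \subseteq \mathrm{cl}(\overline{E})$ 
$\mathrm{MIN}(A) > \alpha$     $A \subseteq \mathrm{cl}(sup_\alpha)$ 
$\mathrm{MAX}(A) < \alpha$     $A \subseteq \mathrm{cl}(\overline{supeq_\alpha})$ 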



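$C(A) = (E \not\subseteq A)$: If $E = \{e_1, e_2, \dots, e_n\}$, this constraint can be rewritten 
$\{e_1\} \not\subseteq A \vee \{e_2\} \not\subseteq A \vee \dots \vee \{e_n\} \not\subseteq A$, which is also 
$A \subseteq \overline{\{e_1\}} \vee \dots \vee A \subseteq \overline{\{e_n\}}$. 
Then the first case and Prop. 6 give the result. 

$C(A) = (\mathrm{MIN}(A) > \alpha)$ and $C(A) = (\mathrm{MAX}(A) < \alpha)$: $C(A) = (\mathrm{MIN}(A) > \alpha)$ 
can be rewritten $A \subseteq sup_\alpha$ with $sup_\alpha = \{a \in \mathcal{A} \mid a.v > \alpha\}$ and we are in 
the first case. $C(A) = (\mathrm{MAX}(A) < \alpha)$ can be rewritten $A \cap supeq_\alpha = \emptyset$ with 
$supeq_\alpha = \{a \in \mathcal{A} \mid a.v \ge \alpha\}$ and we are in the second case. □ 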
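
The rewriting of the MIN and MAX constraints into set constraints can be verified exhaustively on a small example. In the sketch below, the item values in `VALUES` are hypothetical, and the empty itemset is skipped because MIN and MAX are not defined for it.

```python
from itertools import combinations

# Hypothetical item values a.v, chosen only for illustration.
VALUES = {"a1": 5, "a2": 10, "a3": 20, "a4": 30}
ITEMS = frozenset(VALUES)


def sup(alpha):
    """sup_alpha: items whose value is strictly greater than alpha."""
    return frozenset(a for a in ITEMS if VALUES[a] > alpha)


def supeq(alpha):
    """supeq_alpha: items whose value is greater than or equal to alpha."""
    return frozenset(a for a in ITEMS if VALUES[a] >= alpha)


def nonempty_itemsets():
    items = sorted(ITEMS)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def rewriting_holds(alpha):
    """Check both rewritings for every non-empty itemset."""
    for itemset in nonempty_itemsets():
        values = [VALUES[a] for a in itemset]
        # MIN(A) > alpha  <=>  A is a subset of sup_alpha
        if (min(values) > alpha) != (itemset <= sup(alpha)):
            return False
        # MAX(A) < alpha  <=>  A and supeq_alpha are disjoint
        if (max(values) < alpha) != itemset.isdisjoint(supeq(alpha)):
            return False
    return True


print(all(rewriting_holds(alpha) for alpha in (0, 5, 10, 15, 30, 100)))
```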

5.2 Regeneration 

Given a database db and a constraint C, we suppose in this section that a 
collection $\{(A, \mathcal{F}(A, db)) \mid C'(A, db) \wedge C_{\mathrm{close}}(A, db)\}$ of closed itemsets (and their 
frequencies) satisfying a good relaxation C' of C is available. The aim is to compute 
the collection $\{(A, \mathcal{F}(A, db)) \mid C(A, db)\}$ of all itemsets satisfying C (and their 
frequencies). 

If C is a minimum frequency constraint, C is an optimal relaxation of itself, 
therefore we take C' = C. The regeneration algorithm is then the classical 
algorithm 6 of [19]. We briefly recall this algorithm: 

We suppose that the frequent closed itemsets (and their frequencies) of size 
i are stored in the list $\mathcal{C}_i$ for $0 < i \le k$, where k is the size of the longest frequent 
closed itemset. At the end of the algorithm, each $\mathcal{C}_i$ contains all the frequent 
itemsets of size i and their frequencies. 

 1  for (i = k; i > 0; i--) 
 2    forall $A \in \mathcal{C}_i$ 
 3      forall subset B of size (i - 1) of A 
 4        if $B \notin \mathcal{C}_{i-1}$ 
 5          B.freq = A.freq 
 6          $\mathcal{C}_{i-1} = \mathcal{C}_{i-1} \cup \{B\}$ 
 7        endif 
 8      end 
 9    end 
10  end 
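
The recalled algorithm can be sketched in Python as follows. This is an illustration under stated assumptions rather than the algorithm of [19] itself: the dictionary representation and the function name `regenerate` are hypothetical, and there is one deliberate deviation from the pseudocode. When a subset B is reachable from several supersets, the sketch keeps the largest frequency among them instead of the first one encountered, since a superset can never be more frequent than B.

```python
from itertools import combinations


def regenerate(closed):
    """Derive all subsets of the given closed itemsets with a frequency.

    `closed` maps each closed itemset (a frozenset) to its frequency.
    """
    if not closed:
        return {}
    k = max(len(itemset) for itemset in closed)
    # levels[i] plays the role of the list C_i of the pseudocode.
    levels = {i: {} for i in range(k + 1)}
    for itemset, freq in closed.items():
        levels[len(itemset)][itemset] = freq

    for i in range(k, 0, -1):
        for itemset, freq in list(levels[i].items()):
            for combo in combinations(sorted(itemset), i - 1):
                subset = frozenset(combo)
                # Keep the largest frequency seen among the supersets.
                if freq > levels[i - 1].get(subset, -1):
                    levels[i - 1][subset] = freq

    result = {}
    for level in levels.values():
        result.update(level)
    return result


# Hypothetical input: three closed itemsets with their frequencies.
closed = {
    frozenset({"a2", "a3"}): 3,
    frozenset({"a1", "a2", "a3"}): 2,
    frozenset({"a2", "a3", "a4"}): 1,
}
freqs = regenerate(closed)
for itemset in sorted(freqs, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), freqs[itemset])
```

The computed frequency of an itemset is exact whenever its closure belongs to the input collection; otherwise it is only a lower bound.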
If C is not the frequency constraint, this algorithm generates all the subsets 
of the closed itemsets satisfying C' and two problems arise: 

1. Some of these itemsets do not satisfy C. For instance, in Fig. 3, all the 
itemsets of classes 2, 3, 4, 5 and 6 are generated (because they are subsets of 
closed itemsets that satisfy C') and only those of classes 3 and 4 and some 
of classes 2 and 5 satisfy C. 

2. The frequency computed in step 5 of the above algorithm for B is correct 
only if the closure of B is in the collection of the closed sets at the beginning 
of the algorithm. If it is not, then this computed frequency is smaller than 
the true frequency of B. In Fig. 3, this means that the computed frequencies 
of the itemsets of class 6 are not correct. 

However, the good news is that all the itemsets satisfying C are generated 
(because C' is a good relaxation of C) and their computed frequencies are correct 
(because their closures belong to the $\mathcal{C}_i$ at the beginning). 

A last filtering phase is therefore necessary to filter out all the generated 
itemsets that do not satisfy C. This phase can be pushed inside the above gen- 
eration algorithm if the constraint C has good properties (particularly if it is a 
conjunction of a monotonic part and an anti-monotonic one). However, we will 
not detail this point here. 

We are still facing a last problem: to test C(A), we may need $\mathcal{F}(A)$. However, 
if C(A) is false, it is possible that the computed frequency of A is not correct. 
To solve this problem, we propose the following strategy. 

We assume that the constraint C is a Boolean formula built using the atomic 
constraints listed in Tab. 2 and using the two operators $\wedge$ and $\vee$ (if the $\neg$ operator 
appears, it can be pushed inside the formula as shown in Ex. 4). Then, we rewrite 
this constraint in disjunctive normal form (DNF), i.e., $C = C_1 \vee C_2 \vee \dots \vee C_n$ with 
$C_i = \mathcal{A}_{m_{i-1}+1} \wedge \dots \wedge \mathcal{A}_{m_i}$ where each $\mathcal{A}_j$ is a constraint listed in Tab. 2. 

Now, consider an itemset A whose computed frequency is f (with $f \le \mathcal{F}(A)$). 
First, we consider all the conjunctions $C_i$ that we can compute; this includes those 
where $\mathcal{F}(A)$ does not appear and those of the form $\mathcal{F}(A) > \alpha$ or $\mathcal{F}(A) < \alpha$ 
where $\alpha < f$ (in these two cases we can conclude since $\mathcal{F}(A) \ge f$). If one of them 
is true, then C(A) is true and A is not filtered out. 

If all of them are false, we have to consider the remaining conjunctions of the 
form $\mathcal{A}_1 \wedge \dots \wedge (\mathcal{F}(A) > \alpha) \wedge \dots$ with $\alpha \ge f$. If one of the $\mathcal{A}_i$ is false, then the 
conjunction is false. If all are true, we suppose that $\mathcal{F}(A) > \alpha$: in this case C(A) 
is true and therefore $\mathcal{F}(A) = f$, which contradicts $\alpha \ge f$. Therefore, $\mathcal{F}(A) > \alpha$ is 
false and so is the whole conjunction. 

If it is still impossible to answer, it means that all the conjunctions considered 
so far are false, and that there are conjunctions of the form 
$\mathcal{A}_1 \wedge \dots \wedge (\mathcal{F}(A) < \alpha) \wedge \dots$ with $\alpha \ge f$. In this case, it is not possible 
to know if C(A) is true without computing the frequency $\mathcal{F}(A)$. 

Finally, all this means that if there is no constraint of the form $\mathcal{F}(A) < \alpha$ 
in the DNF of C, we can do this last filtering phase efficiently. If one appears, then 
the filtering phase can involve access to the database to compute the frequency 
of some itemsets. Of course, all these frequency computations should be made in 
one access to the database. 
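
The filtering strategy amounts to a three-valued evaluation of the DNF. The following Python sketch is one possible reading of it; the encoding of atoms as `("pred", function)`, `("freq>", alpha)` and `("freq<", alpha)` and the function names are assumptions made for this illustration. A result of `None` means that the frequency must be computed from the database.

```python
def eval_conjunction(conjunction, itemset, f):
    """Three-valued value of one conjunction: True, False or None (unknown).

    `f` is the computed frequency of `itemset`, a lower bound of F(A).
    """
    unknown = False
    for kind, arg in conjunction:
        if kind == "pred":
            # Atomic constraint that does not involve the frequency.
            value = bool(arg(itemset))
        elif kind == "freq>":
            # alpha < f <= F(A): certainly true. Otherwise the atom is taken
            # as false, following the contradiction argument of the text.
            value = arg < f
        elif kind == "freq<":
            # alpha <= f <= F(A): certainly false. Otherwise undecidable.
            value = False if arg <= f else None
        else:
            raise ValueError("unknown atom kind: " + kind)
        if value is None:
            unknown = True
        elif not value:
            return False
    return None if unknown else True


def keep_itemset(dnf, itemset, f):
    """True: keep A. False: filter A out. None: F(A) must be computed."""
    results = [eval_conjunction(conj, itemset, f) for conj in dnf]
    if any(r is True for r in results):
        return True
    if any(r is None for r in results):
        return None
    return False


# C(A) = (F(A) > 1) or (a1 in A), written as a DNF with two conjunctions.
dnf = [[("freq>", 1)], [("pred", lambda a: "a1" in a)]]
print(keep_itemset(dnf, frozenset({"a4"}), 0))
print(keep_itemset(dnf, frozenset({"a1"}), 2))
```

Since this DNF contains no atom of the form "freq<", the function never returns `None` for it and no database access is needed.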

Example 5. In this example, we illustrate the complete process of the resolution 
of the constrained itemset mining problem on two constraints (we still use the 
dataset of Tab. 1): 

$C(A) = ((\mathcal{F}(A) > 1) \vee (a_1 \in A))$. 
This constraint is its own optimal relaxation (cf. Prop. 5 and 6). According to 
the transposition results for classical constraints, its transposed constraint is 
${}^{t}C(O) = ((|O| > 1) \vee (O \subseteq g(a_1)))$ and $g(a_1) = o_1o_2$. The closed object sets that 
satisfy this constraint are $T = \{o_1o_2, o_1o_2o_3, \emptyset\}$. If we apply f to go back to the 
itemset space: $\{f(O) \mid O \in T\} = \{a_1a_2a_3a_4, a_1a_2a_3, a_2a_3\}$. Since this set contains 
$a_1a_2a_3a_4$, all the itemsets are generated. However, the generated frequency for the 
itemsets of the class of $a_2a_3a_4$ is 0. The other generated frequencies are correct. C 
is in DNF with two simple constraints $(\mathcal{F}(A) > 1)$ and $(a_1 \in A)$. During the 
filtering step, when considering the itemsets of the class of $a_2a_3a_4$, the second 
constraint is always false. Since the generated frequency f is 0 and $\alpha$ is 1, $\alpha \ge f$, so 
the first constraint is also taken to be false and these itemsets must be filtered out. 
Finally, the remaining itemsets are exactly those that satisfy C. 

$C(A) = ((\mathcal{F}(A) > 1) \wedge (A \subseteq a_2a_4))$. 
A good relaxation of C is $C'(A) = ((\mathcal{F}(A) > 1) \wedge (A \subseteq \mathrm{cl}(a_2a_4))) = ((\mathcal{F}(A) > 1) \wedge (A \subseteq a_2a_3a_4))$. 
The corresponding transposed constraint is 
${}^{t}C'(O) = ((|O| > 1) \wedge (g(a_2a_3a_4) \subseteq O)) = ((|O| > 1) \wedge (o_3 \subseteq O))$ since $a_2a_3a_4$ is closed. The closed 
object sets that satisfy this constraint are $T = \{o_1o_2o_3\}$. If we apply f to go 
back to the itemset space: $\{f(O) \mid O \in T\} = \{a_2a_3\}$. Then all the subsets of 
$a_2a_3$ are generated and only $\emptyset$ and $a_2$ remain after the filtering step. 
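
The second constraint can be replayed end to end with a short brute-force script. The dataset below is an assumption: it was chosen to agree with the values quoted in the example ($g(a_1) = o_1o_2$, $\mathrm{cl}(a_2a_4) = a_2a_3a_4$, $f(o_1o_2o_3) = a_2a_3$) and may differ from the actual table. The helper names are hypothetical as well.

```python
from itertools import combinations

# Assumed dataset, consistent with the values quoted in Example 5.
DB = {
    "o1": {"a1", "a2", "a3"},
    "o2": {"a1", "a2", "a3"},
    "o3": {"a2", "a3", "a4"},
}
ITEMS = frozenset().union(*DB.values())


def f(objects):
    """Items shared by all the given objects."""
    rows = [DB[o] for o in objects]
    return frozenset(set.intersection(*rows)) if rows else ITEMS


def g(items):
    """Objects that contain all the given items."""
    return frozenset(o for o, row in DB.items() if items <= row)


def subsets(universe):
    elems = sorted(universe)
    return [frozenset(c)
            for n in range(len(elems) + 1)
            for c in combinations(elems, n)]


# Step 1: relax (A subset of a2a4) into (A subset of cl(a2a4)).
E = frozenset({"a2", "a4"})
cl_E = f(g(E))

# Step 2: closed object sets satisfying the transposed relaxed constraint,
# mapped back to closed itemsets; the frequency of f(O) is |O|.
T = [O for O in subsets(DB)
     if g(f(O)) == O and len(O) > 1 and g(cl_E) <= O]
closed = {f(O): len(O) for O in T}

# Step 3: regenerate the subsets, then filter with the original constraint.
freq = {}
for itemset, n in closed.items():
    for subset in subsets(itemset):
        freq[subset] = max(freq.get(subset, 0), n)
result = {a for a, n in freq.items() if n > 1 and a <= E}

print(sorted(sorted(a) for a in result))
```

Under this assumed dataset, the script ends with the empty itemset and $a_2$, in agreement with the example.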

6 Conclusion 

In order to mine constrained closed patterns in databases with more columns 
than rows, we proposed a complete framework for the transposition: we gave 
the expression in the transposed database of the transposition of many classical 
constraints, and showed how to use existing closed set mining algorithms (with 
few modifications) to mine in the transposed database. 

Then we gave a strategy to use this framework to mine all the itemsets satisfying 
a constraint when a constrained closed itemset mining algorithm is available. 
This strategy consists of three steps: generation of a relaxation of the constraint, 
extraction of the closed itemsets satisfying the relaxed constraint and, finally, 
generation of all the itemsets satisfying the original constraint. 

We can therefore choose the smallest space between the object space and the 
attribute space depending on the number of rows/columns in the database. Our 
strategy gives new opportunities for the optimization of mining queries (also 
called inductive queries) in contexts having a pathological size. This transposition 
principle could also be used for the optimization of sequences of queries: the 
closed object sets computed in the transposed database during the evaluation of 
previous queries can be stored in a cache and be re-used to speed up evaluation 
of new queries in a fashion similar to [15]. 